GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords by GopalakrishnanN · Pull Request #28302 · microsoft/onnxruntime

GopalakrishnanN · 2026-04-30T23:39:55Z

Description

Hardens the CPU GridSample operator against undefined behavior caused by static_cast<int64_t> of NaN, ±Inf, or out-of-range floating-point coordinates. Tracks IcM 31000000575970.

Motivation and Context

GridSample previously fed denormalized grid coordinates directly into integer casts (std::floor, std::nearbyint) inside its 2D and 3D Compute paths, the bilinear fast path, and GsReflect. With pathological grid values (NaN, ±Inf, magnitudes beyond INT64_MAX), the casts invoke C++ undefined behavior. On some toolchains/sanitizers this corrupts indices, leading to out-of-bounds reads (a security concern) and on most platforms it just silently returns garbage indices that PixelAtGrid later clamps.

Changes

`onnxruntime/core/providers/cpu/tensor/grid_sample.cc`

GsReflect: explicit guard that returns x_min when the input is non-finite or range <= 0 (covers align_corners=true with a 1×1 image where reflection range collapses to zero).
New helper IsSafeForInt64Conversion<T>(v): rejects NaN/Inf and any magnitude > 2⁶².
2D Compute loop: sanitizes unsafe x/y to x_min/y_min before all static_cast<int64_t> calls (Nearest, Linear, Cubic).
3D Compute loop: same sanitization for x/y/z.
PrecomputeBilinearSamplePlan2D (bilinear fast path): substitutes -0.5 for unsafe coords; the existing mask logic correctly rejects the resulting out-of-range neighbors.
Adds <algorithm> and <cmath> includes.

`onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc`

New standalone TU (renamed from a previously included .inc) with MIT copyright header.
Adds 7 regression tests across <float, MLFloat16>:
- NaN coordinates (nearest + reflection)
- ±Inf coordinates (bilinear + reflection)
- Extreme finite coordinates (bilinear + reflection)
- Cubic + reflection with extreme coordinates (float-only by ONNX type constraint)
- 3D / 5-D nearest + reflection with extreme coordinates
- Bilinear + border with extreme coordinates
- 1×1 image + align_corners=1 (zero reflection range)

`cmake/onnxruntime_unittests.cmake`

Adds the new .cc test to the Emscripten exclusion list alongside the existing grid_sample_test.cc.

Validation

Local Windows Release build: cmake --build . --target onnxruntime_provider_test.
All *GridSample* tests pass: 116 (96 generated + 8 contrib op + 12 custom across float/MLFloat16).
Other unrelated test failures observed in the full provider test run are pre-existing and unrelated to GridSample.

…inates Extreme grid coordinates could produce out-of-range indices in the reflection branch of PixelAtGrid. Two minimal changes: 1. In GsReflect, widen the int n = static_cast<int>(dx / range) computation to int64_t. For large (but finite) inputs, dx / range can exceed INT_MAX, making the narrow int cast undefined behavior. 2. In PixelAtGrid / PixelAtGrid3D, clamp the int64_t index returned by GsReflect back into [0, W-1] / [0, H-1] / [0, D-1] before indexing the image buffer, guaranteeing the final load stays in bounds. Adds a regression test covering reflection padding with large-magnitude grid coordinates.

… CPU Addresses yuslepukhin's review comments on PR #27975: - GsReflect: add explicit guard for non-finite input or non-positive range (returns x_min) before the existing reduction logic so the int64 cast inside GsReflect is never invoked on NaN/Inf or with range == 0. - Introduce IsSafeForInt64Conversion<T>() helper and sanitize unsafe coordinates to the in-range border value (x_min/y_min/z_min) before the float->int64 conversions in: * GridSample<T>::Compute 2D loop (Nearest/Linear/Cubic), * GridSample<T>::Compute 3D loop, * PrecomputeBilinearSamplePlan2D fast path (substitutes -0.5 so the mask logic continues to reject the out-of-range neighbors). This eliminates undefined behavior from static_cast<int64_t> on values outside [INT64_MIN, INT64_MAX] or NaN/Inf. - Add <algorithm> and <cmath> includes. - Expand custom tests to cover NaN, +/-Inf, extreme finite coords across bilinear/cubic + reflection, 3D nearest + reflection, bilinear + border, and dim=1 with align_corners=1 (range == 0 in GsReflect).

Promote the custom GridSample tests from a header-style .inc included into grid_sample_test.cc to a standalone translation unit: - Add MIT copyright header, includes, and namespace wrapping. - Move GetExecutionProviders/RunTests helpers into an anonymous namespace inside the new file (separate internal-linkage copies, so there is no ODR conflict with the versions in grid_sample_test.cc). - Remove the #include from grid_sample_test.cc. - Add the new file to the Emscripten exclusion list in cmake/onnxruntime_unittests.cmake alongside grid_sample_test.cc.

The pathological-coordinate regression tests (NaN/Inf/extreme grid values) target CPU-side hardening. Running them against non-CPU EPs (CUDA/CoreML/iOS/WebGPU) introduces unrelated backend behavior and broad provider-test failure cascades. Restrict those tests to the CPU EP via a new RunCpuOnly helper. The cubic test is also converted to a non-typed TEST over float only, since ONNX GridSample cubic mode constrains T1 to float (the MLFloat16 instantiation was a no-op). The two boundary-condition tests (linear_zeros_mixed_bounds_*) continue to run on all available EPs as cross-EP correctness checks.

yuslepukhin · 2026-05-01T17:28:02Z

Gaps in test coverage:

zeros padding + extreme/NaN/Inf coordinates: Not tested. While zeros mode is safe by design, a regression test would confirm the IsSafeForInt64Conversion sanitization doesn't break it.
cubic + NaN/Inf: Only extreme-finite is tested for cubic. NaN and Inf paths through cubic interpolation are not covered.
3D (5D) + linear mode + extreme coordinates: Only nearest mode is tested for 3D. The 3D trilinear interpolation path with extreme coords is not covered.
3D + NaN/Inf coordinates: Not tested at all for the 3D path.
align_corners=1 with non-degenerate images + extreme coordinates: Only the degenerate 1×1 case is tested with align_corners=1.
Negative extreme coordinates asymmetry: Most tests use symmetric positive/negative extremes. A test with only-negative extreme coordinates in a non-constant image would verify correct reflection direction.
Mixed normal + pathological coordinates in the same grid: All pathological tests use constant-valued images (all 1.0), making the expected output trivially deterministic. A test with a non-constant image and a mix of normal + pathological grid points would validate that normal sampling is unaffected by the sanitization.

yuslepukhin · 2026-05-01T17:30:22Z

CUDA is not fixed.

The CUDA implementation in grid_sample_impl.cu has the exact same vulnerability (this is a contrib op under onnxruntime::contrib::cuda):

Additionally:

No NaN/Inf guard in GsReflect
No safety clamp in PixelAtGrid after the reflection branch
No IsSafeForInt64Conversion check before static_cast<int64_t> calls in the compute kernels
The CUDA implementation needs an equivalent fix. While GPU UB is less dangerous than CPU UB (no ASAN-detectable heap corruption), it still produces garbage output silently and can crash the GPU context.

WebGPU/JS implementations are not vulnerable — WGSL integer conversions have defined saturation semantics.

File a follow-up for CUDA.

Add 7 new CPU-only regression tests for the float->int64 cast hardening: - linear+zeros+extreme/NaN/Inf: confirms IsSafeForInt64Conversion does not break the bilinear zeros-padding fast path. - cubic+NaN/Inf: covers NaN/+Inf/-Inf paths through the cubic kernel (float-only). - 5D linear+extreme: 3D trilinear path with extreme finite coordinates. - 5D nearest+NaN/Inf: 3D path with non-finite values along different axes. - align_corners=1 non-degenerate (3x3) image with extreme/NaN coords: verifies sanitization to x_min=0 maps to pixel(0,0). - negative-only extreme on non-constant image: verifies sanitization is direction-agnostic and rounds to pixel(0,0). - mixed normal+pathological grid in one call on a non-constant image: verifies normal points sample correctly and only unsafe coords are sanitized.

github-actions

You can commit the suggested changes from lintrunner.

GopalakrishnanN · 2026-05-01T19:59:08Z

Work item to cover the same vulnerability in CUDA - https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/93864

yuslepukhin

LGTM

Gopalakrishnan Nallasamy added 3 commits April 30, 2026 14:07

GopalakrishnanN requested a review from yuslepukhin May 1, 2026 00:41

yuslepukhin reviewed May 1, 2026

View reviewed changes

Comment thread cmake/onnxruntime_unittests.cmake

github-actions Bot reviewed May 1, 2026

View reviewed changes

Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc Outdated

github-advanced-security AI found potential problems May 1, 2026

View reviewed changes

Comment thread onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc Fixed

Apply clang-format to grid_sample_test_custom.cc

6229b12

yuslepukhin approved these changes May 2, 2026

View reviewed changes

yuslepukhin merged commit 55c5c82 into main May 2, 2026
89 checks passed

yuslepukhin deleted the fix/IntegerOverFlow-IcM31000000575970 branch May 2, 2026 00:20

BrewTestBot mentioned this pull request May 8, 2026

onnxruntime 1.26.0 Homebrew/homebrew-core#281672

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords#28302

GridSample: harden float->int64 casts against NaN/Inf/out-of-range coords#28302
yuslepukhin merged 6 commits intomainfrom
fix/IntegerOverFlow-IcM31000000575970

GopalakrishnanN commented Apr 30, 2026

Uh oh!

yuslepukhin commented May 1, 2026

Uh oh!

yuslepukhin commented May 1, 2026

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Uh oh!

GopalakrishnanN commented May 1, 2026

Uh oh!

yuslepukhin left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

GopalakrishnanN commented Apr 30, 2026

Description

Motivation and Context

Changes

onnxruntime/core/providers/cpu/tensor/grid_sample.cc

onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc

cmake/onnxruntime_unittests.cmake

Validation

Related

Uh oh!

yuslepukhin commented May 1, 2026

Uh oh!

yuslepukhin commented May 1, 2026

CUDA is not fixed.

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

GopalakrishnanN commented May 1, 2026

Uh oh!

yuslepukhin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`onnxruntime/core/providers/cpu/tensor/grid_sample.cc`

`onnxruntime/test/providers/cpu/tensor/grid_sample_test_custom.cc`

`cmake/onnxruntime_unittests.cmake`